
[25.1] Use workflow-style payload in data landing request#21107

Merged
mvdbeek merged 2 commits into galaxyproject:release_25.1 from mvdbeek:landing_request_api_tweaks
Oct 21, 2025

Conversation

@mvdbeek
Member

@mvdbeek mvdbeek commented Oct 19, 2025

Makes it considerably easier for sites that support data landing and workflow requests, and it's also easier to read and validate.

Closes #21097

Hoping we can get this into 25.1 since it's new functionality that is not yet out in the wild.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jmchilton
Member

We have an upload API and we have an API for supplying files to a workflow form. In the abstract, don't you think the data landing API should reflect the upload API rather than the workflow run API? That feels like letting your current use case override the best design in the abstract. You're doing the integration work, so I'll let you make this change and merge it into 25.1, but I don't like it in the abstract; it makes little sense to me that we wouldn't model the uploads by default for this API. Would a request type parameter, added to both this API and the data fetch API so that each can be compatible with the other, be a compromise we could both be happy with? If we're saying the workflow form representation is the best, I would think we should have an interface to upload directly with it.

Member

@jmchilton jmchilton left a comment


Like I said, if you're doing the integration work I'm not going to get in the way, but I don't love it.

@mvdbeek
Member Author

mvdbeek commented Oct 20, 2025

you don't think the data landing API should reflect the upload API instead over the workflow run API

I think we should have just one documented API, and the workflow run API, while less comprehensive, is subjectively easier to write, and easier to document and validate (the latter, I think, is less subjective?).

your current use case override the best design in the abstract.

That is where I am coming from; however, I don't think we're preventing any of the advanced features from being implemented with the simpler request API. And while we aren't supporting attaching files, we shouldn't advertise them in the API, which we currently do.

Is having an request type parameter we could add to both this API and to the data fetch API to allow both of them to be compatible with the other be a compromise that we could both be happy with?

A separate route for each sounds good to me, yes. I do think we want the upload API to also work with the simpler request syntax, but that's probably more work than I would take on right now?

@mvdbeek mvdbeek force-pushed the landing_request_api_tweaks branch from 1de38e5 to a2d1851 on October 20, 2025 at 07:48
@mvdbeek
Member Author

mvdbeek commented Oct 20, 2025

that one I think is less subjective

I guess I should elaborate. This is the fetch data API:

- destination: {type: "hdca"}
  collection_type: list
  name: "my collection"
  items:
  - src: url
    name: "sample1"
    url: "base64://eyJ0c3JjIjogInRlc3QifQ=="
    ext: "txt"
  - src: url
    name: sample2
    url: "base64://eyJ0c3JjIjogInRlc3QifQ=="
    ext: "txt"
  - src: url
    name: sample3
    url: "base64://eyJ0c3JjIjogInRlc3QifQ=="
    ext: "txt"

This is the workflow run / file / class API:

- class: Collection
  collection_type: list
  name: "my collection"
  elements:
  - class: File
    name: sample1
    location: "base64://eyJ0c3JjIjogInRlc3QifQ=="
    ext: "txt"
  - class: File
    name: sample2
    location: "base64://eyJ0c3JjIjogInRlc3QifQ=="
    ext: "txt"
  - class: File
    name: sample3
    location: "base64://eyJ0c3JjIjogInRlc3QifQ=="
    ext: "txt"

If you squint they look similar, but the validation works quite differently. The class attribute lets us use a discriminated union, so we don't have to ask which schema variant each entry matches and, when none matches, report every variant's mismatches. On top of that, the fetch API needs a model validator (or manual validation), because destination limits the top-level items.
Here's the validation error for a typo in the extension field with the file API:

- '{"type": "missing", "loc": ["body", "request_state", 0, "Collection", "elements",
  0, "File", "ext"], "msg": "Field required", "input": {"class": "File", "name": "Reference
  information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==", "extensionn": "txt"}}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", 0, "Collection", "elements",
  0, "File", "extensionn"], "msg": "Extra inputs are not permitted", "input": "txt"}'

kind of readable, right?

this is the same typo with the current API:

- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "DataElementsTarget",
  "items", 0, "tagged-union[FileDataElement,PastedDataElement,UrlDataElement,PathDataElement,ServerDirElement,FtpImportElement,CompositeDataElement]",
  "url", "extensionn"], "msg": "Extra inputs are not permitted", "input": "txt"}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "DataElementsTarget",
  "items", 0, "NestedElement", "elements"], "msg": "Field required", "input": {"src":
  "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "DataElementsTarget",
  "items", 0, "NestedElement", "src"], "msg": "Extra inputs are not permitted", "input":
  "url"}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "DataElementsTarget",
  "items", 0, "NestedElement", "url"], "msg": "Extra inputs are not permitted", "input":
  "base64://eyJ0ZXN0IjogInRlc3QifQ=="}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "DataElementsTarget",
  "items", 0, "NestedElement", "extensionn"], "msg": "Extra inputs are not permitted",
  "input": "txt"}'
- '{"type": "literal_error", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsTarget",
  "destination", "type"], "msg": "Input should be ''hdca''", "input": "hdas"}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsTarget",
  "items", 0, "tagged-union[FileDataElement,PastedDataElement,UrlDataElement,PathDataElement,ServerDirElement,FtpImportElement,CompositeDataElement]",
  "url", "extensionn"], "msg": "Extra inputs are not permitted", "input": "txt"}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsTarget",
  "items", 0, "NestedElement", "elements"], "msg": "Field required", "input": {"src":
  "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsTarget",
  "items", 0, "NestedElement", "src"], "msg": "Extra inputs are not permitted", "input":
  "url"}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsTarget",
  "items", 0, "NestedElement", "url"], "msg": "Extra inputs are not permitted", "input":
  "base64://eyJ0ZXN0IjogInRlc3QifQ=="}'
- '{"type": "extra_forbidden", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsTarget",
  "items", 0, "NestedElement", "extensionn"], "msg": "Extra inputs are not permitted",
  "input": "txt"}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "DataElementsFromTarget",
  "src"], "msg": "Field required", "input": {"destination": {"type": "hdas"}, "items":
  [{"src": "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}]}}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "DataElementsFromTarget",
  "elements_from"], "msg": "Field required", "input": {"destination": {"type": "hdas"},
  "items": [{"src": "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}]}}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsFromTarget",
  "src"], "msg": "Field required", "input": {"destination": {"type": "hdas"}, "items":
  [{"src": "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}]}}'
- '{"type": "literal_error", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsFromTarget",
  "destination", "type"], "msg": "Input should be ''hdca''", "input": "hdas"}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "HdcaDataItemsFromTarget",
  "items_from"], "msg": "Field required", "input": {"destination": {"type": "hdas"},
  "items": [{"src": "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}]}}'
- '{"type": "literal_error", "loc": ["body", "request_state", "targets", 0, "FtpImportTarget",
  "destination", "type"], "msg": "Input should be ''hdca''", "input": "hdas"}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "FtpImportTarget",
  "src"], "msg": "Field required", "input": {"destination": {"type": "hdas"}, "items":
  [{"src": "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}]}}'
- '{"type": "missing", "loc": ["body", "request_state", "targets", 0, "FtpImportTarget",
  "ftp_path"], "msg": "Field required", "input": {"destination": {"type": "hdas"},
  "items": [{"src": "url", "name": "Reference information", "url": "base64://eyJ0ZXN0IjogInRlc3QifQ==",
  "extensionn": "txt"}]}}

This gets worse with more complex collections, and it's not just the validation errors; the same complexity carries over into the schema we could provide for people authoring payloads, etc.
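The discriminator argument above can be sketched with Pydantic v2. These File/Collection models are a minimal hypothetical reconstruction for illustration, not Galaxy's actual request models:

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class File(BaseModel):
    model_config = ConfigDict(extra="forbid", populate_by_name=True)

    kind: Literal["File"] = Field(alias="class")
    name: str
    location: str
    ext: str


class Collection(BaseModel):
    model_config = ConfigDict(extra="forbid", populate_by_name=True)

    kind: Literal["Collection"] = Field(alias="class")
    collection_type: str
    name: str
    elements: list["Element"]


# The "class" key picks the schema variant up front, so Pydantic reports
# mismatches against that one variant only, not against every union member.
Element = Annotated[Union[File, Collection], Field(discriminator="kind")]
Collection.model_rebuild()

payload = {
    "class": "Collection",
    "collection_type": "list",
    "name": "my collection",
    "elements": [
        # "extensionn" is a deliberate typo for "ext"
        {"class": "File", "name": "sample1",
         "location": "base64://...", "extensionn": "txt"},
    ],
}

try:
    Collection.model_validate(payload)
    errors = []
except ValidationError as exc:
    errors = exc.errors()

# Two errors, both scoped to the File variant: missing "ext" and the
# extra "extensionn" key.
for err in errors:
    print(err["type"], err["loc"])
```

Without the discriminator, the same typo would be reported once per union member, which is essentially what the long error dump below shows for the current fetch-style API.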

@jmchilton
Member

The way Pydantic displays errors should not be relevant to how we design the API, IMO. Also, the data fetch schema allows creation of libraries and library folders, and reading things from different sources... none of those things has a direct equivalent in the workflow run API, right?

I remain unconvinced this isn't an overcorrection and an over-specification toward a more limited API, and I also remain unconvinced the data fetch schema is inferior. I do, however, believe that both the data fetch API and the data landing API should support workflow-run-style uploads; that just makes sense to me and would enable this use case. I would just add a payload_type property on those APIs to implement it instead of duplicating the APIs.

@ahmedhamidawan ahmedhamidawan modified the milestones: 26.0, 25.1 Oct 20, 2025
@mvdbeek
Member Author

mvdbeek commented Oct 21, 2025

The way Pydantic displays errors should not be relevant to how we design the API IMO.

Of course, but usability should still matter. I used this to illustrate two issues with the current syntax:

  • the "target" we're creating (hda, hdca, ldda etc) is somewhat detached from the thing we want to put there
  • there's no consistent discriminator for the elements we can include

which is why pydantic has to validate against everything the thing could be, and I think that's a design issue.
But it's not just pydantic: any implementer has to make the same mental effort. You could also look at how one would have to use /api/data_landings with our frontend fetcher setup (maybe not the best point, since it's generated from our pydantic models 😅).

creation of libraries and library folders

class: Library and class: LibraryFolder would be how I'd implement this; then I know I have to provide what satisfies LibraryDataset, and vice versa I know I can't provide collection metadata.
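For illustration, a class-based library payload might look like this (these class names and fields are hypothetical, sketching the idea rather than an implemented schema):

```yaml
- class: Library
  name: "my library"
  elements:
  - class: LibraryFolder
    name: "raw data"
    elements:
    - class: File
      name: sample1
      location: "https://example.org/sample1.txt"
      ext: "txt"
```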

reading things from different sources.

like items_from? This seems doable with the run API too

I would just add a payload_type property on the those APIs to implement it instead of duplicating the APIs.

I've added api/file_landings and restored the previous version of api/data_landings; does that work for you?

@mvdbeek mvdbeek force-pushed the landing_request_api_tweaks branch 2 times, most recently from 661bd62 to 4672dd0 on October 21, 2025 at 08:50
Makes it considerably easier for sites that support data landing and
workflow requests, and it's also easier to read and validate.

Closes galaxyproject#21097
@mvdbeek mvdbeek force-pushed the landing_request_api_tweaks branch from 4672dd0 to 5752c17 on October 21, 2025 at 14:53
@mvdbeek mvdbeek merged commit ba446cb into galaxyproject:release_25.1 Oct 21, 2025
52 of 56 checks passed